Comprehensive evaluation of fusion transcript detection algorithms and a meta-caller to combine top performing methods in paired-end RNA-seq data.

Authors

  • Silvia Liu
  • Wei-Hsiang Tsai
  • Ying Ding
  • Rui Chen
  • Zhou Fang
  • Zhiguang Huo
  • SungHwan Kim
  • Tianzhou Ma
  • Ting-Yu Chang
  • Nolan Michael Priedigkeit
  • Adrian V Lee
  • Jianhua Luo
  • Hsei-Wei Wang
  • I-Fang Chung
  • George C Tseng
Abstract

BACKGROUND Fusion transcripts are formed by either fusion genes (DNA level) or trans-splicing events (RNA level). They have been recognized as a promising tool for diagnosing, subtyping and treating cancers. RNA-seq has become a precise and efficient standard for genome-wide screening of such aberration events. Many fusion transcript detection algorithms have been developed for paired-end RNA-seq data, but their performance has not been comprehensively evaluated to guide practitioners. In this paper, we evaluated 15 popular algorithms by their precision and recall trade-off, accuracy of supporting reads and computational cost. We further combined top-performing methods for improved ensemble detection.

RESULTS Fifteen fusion transcript detection tools were compared using three synthetic data sets under different coverage, read length, insert size and background noise, and three real data sets with selected experimental validations. No single method performed best across all settings, but SOAPfuse generally performed well, followed by FusionCatcher and JAFFA. We further demonstrated the potential of a meta-caller algorithm that combines top-performing methods to re-prioritize candidate fusion transcripts with high confidence, which can then be followed by experimental validation.

CONCLUSION Our results provide insightful recommendations for applying an individual tool or combining top performers to identify fusion transcript candidates.
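The abstract does not describe how the meta-caller combines tools, so the following is only a minimal sketch of one common ensemble idea: keep fusions reported by at least a minimum number of tools, rank them by vote count, and score the result by precision and recall against a known truth set. The function names, the `min_votes` threshold, the gene-pair names, and all calls in the toy data are hypothetical and do not come from the paper.

```python
from collections import Counter


def precision_recall(called, truth):
    """Return (precision, recall) of called fusions against a truth set."""
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)
    precision = true_pos / len(called) if called else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


def vote_meta_caller(calls_by_tool, min_votes=2):
    """Keep fusions reported by at least `min_votes` tools, most votes first.

    This is a plain voting scheme used for illustration only; it is an
    assumption, not the meta-caller evaluated in the paper.
    """
    votes = Counter()
    for fusions in calls_by_tool.values():
        votes.update(set(fusions))  # each tool votes at most once per fusion
    kept = [(fusion, n) for fusion, n in votes.items() if n >= min_votes]
    # Sort by descending vote count, then by gene pair for a stable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Toy, made-up calls keyed by tool name (gene pairs are placeholders).
calls_by_tool = {
    "SOAPfuse": [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D"), ("GENE_E", "GENE_F")],
    "FusionCatcher": [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")],
    "JAFFA": [("GENE_A", "GENE_B"), ("GENE_G", "GENE_H")],
}
# Hypothetical truth set, as would be known for a synthetic data set.
truth = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D"), ("GENE_X", "GENE_Y")]

ranked = vote_meta_caller(calls_by_tool, min_votes=2)
consensus = [fusion for fusion, _ in ranked]
precision, recall = precision_recall(consensus, truth)
```

In this toy setup, raising `min_votes` tends to trade recall for precision, which is the kind of trade-off the evaluation above refers to; a real comparison would also need to normalize gene names and breakpoints across tool outputs.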


Similar resources

ChimeRScope: a novel alignment-free algorithm for fusion transcript prediction using paired-end RNA-Seq data

The RNA-Seq technology has revolutionized transcriptome characterization not only by accurately quantifying gene expression, but also by the identification of novel transcripts like chimeric fusion transcripts. The 'fusion' or 'chimeric' transcripts have improved the diagnosis and prognosis of several tumors, and have led to the development of novel therapeutic regimen. The fusion transcript de...


FusionMap: detecting fusion genes from next-generation sequencing data at base-pair resolution

MOTIVATION Next generation sequencing technology generates high-throughput data, which allows us to detect fusion genes at both transcript and genomic levels. To detect fusion genes, the current bioinformatics tools heavily rely on paired-end approaches and overlook the importance of reads that span fusion junctions. Thus there is a need to develop an efficient aligner to detect fusion events b...


Discovering chimeric transcripts in paired-end RNA-seq data by using EricScript

MOTIVATION The discovery of novel gene fusions can lead to a better comprehension of cancer progression and development. The emergence of deep sequencing of the transcriptome, known as RNA-seq, has opened many opportunities for the identification of this class of genomic alterations, leading to the discovery of novel chimeric transcripts in melanomas, breast cancers and lymphomas. Nowadays, few comp...

A probabilistic framework for aligning paired-end RNA-seq data

MOTIVATION The RNA-seq paired-end read (PER) protocol samples transcript fragments longer than the sequencing capability of today's technology by sequencing just the two ends of each fragment. Deep sampling of the transcriptome using the PER protocol presents the opportunity to reconstruct the unsequenced portion of each transcript fragment using end reads from overlapping PERs, guided by the e...


Bellerophontes: A RNA-Seq data analysis framework for chimeric transcripts discovery based on accurate fusion model

Motivation: Next generation sequencing technology allows the detection of genomic structural variations, novel genes and transcript isoforms from the analysis of high throughput data. In this work, we propose a new framework for the detection of fusion transcripts through short paired-end reads which integrates splicing-driven alignment and abundance estimation analysis, producing a more accura...



Journal:
  • Nucleic Acids Research

Volume 44, Issue 5

Publication date: 2016